Using Reinforcement Learning to Spider the Web Efficiently
نویسندگان
چکیده
Consider the task of exploring the Web in order to find pages of a particular kind or on a particular topic. This task arises in the construction of search engines and Web knowledge bases. This paper argues that the creation of efficient web spiders is best framed and solved by reinforcement learning, a branch of machine learning that concerns itself with optimal sequential decision making. One strength of reinforcement learning is that it provides a formalism for measuring the utility of actions that give benefit only in the future. We present an algorithm for learning a value function that maps hyperlinks to future discounted reward using a naive Bayes text classifier. Experiments on two real-world spidering tasks show a threefold improvement in spidering efficiency over traditional breadth-first search, and up to a two-fold improvement over reinforcement learning with immediate reward only.
منابع مشابه
Expanding Reinforcement Learning Approaches for Efficient Crawling the Web
The amount of accessible information on World Wide Web is increasing rapidly, so that a general-purpose search engine cannot index everything on the Web. Focused crawlers have been proposed as a potential approach to overcome the coverage problem of search engines by limiting the domain of concentration of them. Focused crawling is a technique which is able to crawl particular topical portions ...
متن کاملRRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features
Principal aim of a search engine is to provide the sorted results according to user’s requirements. To achieve this aim, it employs ranking methods to rank the web documents based on their significance and relevance to user query. The novelty of this paper is to provide user feedback-based ranking algorithm using reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...
متن کاملUsing Reinforcement Learning to Spider the Web Eeciently
Consider the task of exploring the Web in order to nd pages of a particular kind or on a particular topic. This task arises in the construction of search engines and Web knowledge bases. This paper argues that the creation of eecient web spiders is best framed and solved by reinforcement learning, a branch of machine learning that concerns itself with optimal sequential decision making. One str...
متن کاملWeb pages ranking algorithm based on reinforcement learning and user feedback
The main challenge of a search engine is ranking web documents to provide the best response to a user`s query. Despite the huge number of the extracted results for user`s query, only a small number of the first results are examined by users; therefore, the insertion of the related results in the first ranks is of great importance. In this paper, a ranking algorithm based on the reinforcement le...
متن کاملReinforcement Learning in Neural Networks: A Survey
In recent years, researches on reinforcement learning (RL) have focused on bridging the gap between adaptive optimal control and bio-inspired learning techniques. Neural network reinforcement learning (NNRL) is among the most popular algorithms in the RL framework. The advantage of using neural networks enables the RL to search for optimal policies more efficiently in several real-life applicat...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1999